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Abstract 

We solve Poisson's equation using new multigrid algorithms that 
converge rapidly. The novel feature of the 2D and 3D algorithms are 
the use of extra diagonal grids in the multigrid hierarchy for a much 
richer and effective communication between the levels of the multigrid. 
Numerical experiments solving Poisson's equation in the unit square 
and unit cube show simple versions of the proposed algorithms are up 
to twice as fast as correspondingly simple multigrid iterations on a 
conventional hierarchy of grids. 
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1 Introduction 



Multigrid algorithms are effective in the solution of elliptic problems and 
have found many applications, especially in fluid mechanics || e.g.], chemical 
reactions in flows || e.g.] and flows in porous media 0. Typically, errors in 
a solution may decrease by a factor of 0.1 each iteration |5|, e.g.]. The simple 
algorithms I present decrease errors by a factor of 0.05 (see Tables [I] and ^ 
on pages 8 and 18). Further gains in the rate of convergence may be made 
by further research. 

Conventional multigrid algorithms use a hierarchy of grids whose grid 
spacings are all proportional to 2~ e where t is the level of the grid || e.g.]. 
The promising possibility I report on here is the use of a richer hierarchy 
of grids with levels of the grids oriented diagonally to other levels. Specifi- 
cally, in 2D I introduce in Section ^| a hierarchy of grids with grid spacings 
proportional to 2~ e ^ 2 and with grids aligned at 45° to adjacent levels, see 
Figure [l] (p§).Q In 3D the geometry of the grids is much more complicated. 
In Section |3| we introduce and analyse a hierarchy of 3D grids with grid spac- 
ings roughly 2~ e / 3 on the different levels, see Figure f| (p|lO|). Now Laplace's 
operator is isotropic so that its discretisation is straightforward on these di- 
agonally oriented grids. Thus in this initial work I explore only the solution 
of Poisson's equation. 

V 2 u = f. (1) 

Given an approximation u to a solution, each complete iteration of a multigrid 
scheme seek a correction v so that u — u + v is a better approximation 

lr This paper is best viewed and printed in colour as the grid diagrams and the ensuing 
discussions are all colour coded. 
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to a solution of Poisson's equation ([]]). Consequently the update v has to 
approximately satisfy a Poisson's equation itself, namely 

V 2 f = r , where r = f — V 2 £t , (2) 

is the residual of the current approximation. The multigrid algorithms aim to 
estimate the error v as accurately as possible from the residual r. Accuracy in 
the ultimate solution u is determined by the accuracy of the spatial discreti- 
sation in the computation of the residual r: here we investigate second-order 
and fourth-order accurate discretisations || e.g.] but so far only find remark- 
ably rapid convergence for second-order discretisations. 

The diagonal grids employed here are perhaps an alternative to the semi- 
coarsening hierarchy of multigrids used by Dendy Jy in more difficult prob- 
lems. 

In this initial research we only examine the simplest reasonable V-cycle 
on the special hierarchy of grids and use only one Jacobi iteration on each 
grid. We find in Sections |2.1| and |3.1| that the smoothing restriction step from 
one grid to the coarser diagonally orientated grid is done quite simply. Yet 
the effective smoothing operator from one level to that a factor of 2 coarser, 
being the convolution of two or three intermediate steps, is relatively sophis- 
ticated. One saving in using these diagonally orientated grids is that there 
is no need to do any interpolation. Thus the transfer of information from 
a coarser to a finer grid only involves the simple Jacobi iterations described 



in Sections |2.2| and |3.2| . Performance is enhanced within this class of sim- 
ple multigrid algorithms by a little over relaxation in the Jacobi iteration 
as found in Sections |2]4] and |3.4| . The proposed multigrid algorithms are 
found to be up to twice as fast as comparably simple conventional multigrid 
algorithms. 



2 A diagonal multigrid for the 2D Poisson 
equation 

To approximately solve Poisson's equation (|2|) in two- dimensions we use a 
novel hierarchy of grids in the multigrid method. The length scales of the 
grid are 2 - ^ 2 . If the finest grid is aligned with the coordinate axes with grid 
spacing h say, the first coarser grid is at 45° with spacing \/2h, the second 
coarser is once again aligned with the axes and of spacing 2h, as shown in 
Figure [I], and so on for all other levels on the multigrid. In going from one 
level to the next coarser level the number of grid points halves. 
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Figure 1: three levels of grids in the 2D multigrid hierarchy: the dotted green 
grid is the finest, spacing h say; the dashed red grid is the next finest diagonal 
grid with spacing V2h; the solid blue grid is the coarsest shown grid with 
spacing 2h. Coarser levels of the multigrid follow the same pattern. 
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Figure 2: restriction stencils are simple weighted averages of neighbouring 
grid points on all levels of the grid. 

2.1 The smoothing restriction 

The restriction operator smoothing the residual from one grid to the next 
coarser grid is the same at all levels. It is simply a weighted average of the 
grid point and the four nearest neighbours on the finer grid as shown in 
Figure ^|. To restrict from a fine green grid to the diagonal red grid 



' :J 



ij'+l 



(3) 



whereas to restrict from a diagonal red grid to the coarser blue grid 



'•J 



- (V . + 



£ £ £ £ 

r i-l,j-l + r m,j-l + r i+lj+l + r i-lJ+l 



(4) 



Each of these restrictions takes 6 flops per grid element. Thus assuming the 
finest grid isnxn with N = n 2 grid points, the restriction to the next finer 
diagonal grid (red) takes approximately 3iV flops, the restriction to the next 
finer takes approximately 3N/2 flops, etc. Thus to restrict the residuals up 
£ = 2L levels to the coarsest grid spacing of H = 2 L h takes 



K r « 



6N 1 



4 L 



flops ~ 6 N flops . 



(5) 



In contrast a conventional nine point restriction operator from one level to 
another takes 11 flops per grid point, which then totals to approximately 
3|iV flops over the whole conventional multigrid hierarchy. This is somewhat 
better than the proposed scheme, but we make gains elsewhere. In restricting 
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Figure 3: the interpolation in a prolongation step is replaced by simply a 
"red-black" Jacobi iteration: (a) compute the new values at the red grid 
points, then refine the values at the blue points; (b) compute the new values 
at the green points, then refine those at the red points. 

from the green grid to the blue grid, via the diagonal red grid, the restriction 
operation is equivalent to a 17-point stencil with a much richer and more 
effective smoothing than the conventional 9-point stencil. 



2.2 The Jacobi prolongation 

One immediate saving is that there is no need to interpolate in the prolon- 
gation step from one level to the next finer level. For example, to prolongate 
from the blue grid to the finer diagonal red grid, shown in Figure 0(a), esti- 
mate the new value of v at the red grid points on level I by the red- Jacobi 
iteration 



1,3 



j+1 + V i-l,j+l 



(6) 



when the grid spacing on the red grid is \[2h. Then the values at the blue 
grid points are refined by the blue- Jacobi iteration 



v h 4 



i (- 2/i Mj + + <4u-i + + vi-u+i) • ( 7 ) 



A similar green-red Jacobi iteration will implicitly prolongate from the red 
grid to the finer green grid shown in Figure 0(b). These prolongation- 
iteration steps take 6 flops per grid point. Thus to go from the red to the 
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green grid takes 6iV flops. As each level of the grid has half as many grid 
points as the next finer, the total operation count for the prolongation over 
the hierarchy from grid spacing H = 2 L h is 

K p « 12N - — ) flops « 12N flops . (8) 

The simplest (bilinear) conventional interpolation direct from the blue 
grid to the green grid would take approximately 2N flops, to be followed by 
6N flops for a Jacobi iteration on the fine green grid (using simply v\ — 
and v 2 = 1). Over the whole hierarchy this takes approximately 10|iV flops. 
This is a little smaller than that proposed here, but the proposed diagonal 
method achieves virtually two Jacobi iterations instead of just one and so is 
more effective. 



2.3 The V-cycle converges rapidly 

Numerical experiments show that although the operation count of the pro- 
posed algorithm is a little higher than the simplest usual multigrid scheme, 
the speed of convergence is much better. The algorithm performs remarkably 
well on test problems such as those in Gupta et al @ . I report a quantitative 
comparison between the algorithms that show the diagonal scheme proposed 
here is about twice as fast. 

Both the diagonal and usual multigrid algorithms use 7N flops to compute 
the residuals on the finest grid. Thus the proposed method takes approxi- 
mately 25N flops per V-cycle of the multigrid iteration, although 17% more 
than the simplest conventional algorithm that takes 21 |iV flops, the conver- 
gence is much faster. Table |I| shows the rate of convergence po ~ 0.1 for this 
diagonal multigrid based algorithm. The data is determined using matlab's 
sparse eigenvalue routine to find the largest eigenvalue and hence the slowest 
decay on a 65 x 65 grid. This should be more accurate than limited analytical 
methods such as a bi-grid analysis |§ . Compared with correspondingly sim- 
ple schemes based upon the usual hierarchy of grids, the method proposed 
here takes much fewer iterations, even though each iteration is a little more 
expensive, and so should be about twice as fast. 

Fourth-order accurate solvers in space may be obtained using the above 
second-order accurate V-cycle as done by Iyengar & Goyal [|]]. The only 
necessary change is to compute the residual r in (||]) on the finest grid with 
a fourth-order accurate scheme, such as the compact "Mehrstellen" scheme 

r i,j = 12 + + + + foj-'d 
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Table 1: comparison of cost, in flops, and performance for various algorithms 
for solving Poisson's equation in two spatial dimensions. The column headed 
"per iter" shows the number of flops per iteration, whereas columns show- 
ing "per dig" are flops/ log 10 p and indicate the number of flops needed to 
compute each decimal digit of accuracy. The right-hand columns show the 
performance for the optimal over relaxation parameter p. 



algorithm 


per iter 


p per dig 


p p per dig 


diagonal, O (h 2 ) 
usual, O (h 2 ) 


25.07V 
21.37V 


.099 25.07V 
.340 45.5iV 


1.052 .052 19.57V 
1.121 .260 36.47V 


diagonal, O (h 4 ) 
usual, O (h 4 ) 


30.07V 
26.37V 


.333 62.87V 
.343 56.6iV 


1.200 .200 42.97V 
1.216 .216 39.47V 



- [ _20w ij + 4 ( u i,j-l + u i,j+l + u i-l,j + u i+l,j) 

+ Wi+lj+l + Mi-lj+l + + Wi+lj-l] • (9) 

Use the V-cycles described above to determine an approximate correction 
v to the field u based upon these more accurate residuals. The operation 
count is solely increased by the increased computation in the residual, from 
77V flops per iteration to 127V flops (the combination of / appearing on the 
right-hand side of (||) need not be computed each iteration). Numerical 
experiments summarised in Table [l] show that the multigrid methods still 
converge, but the diagonal method has lost its advantages. Thus fourth 
order accurate solutions to Poisson's equation are most quickly obtained 
by initially using the diagonal multigrid method applied to the second order 
accurate computation of residuals. Then use a few multigrid iterations based 
upon the fourth order residuals to refine the numerical solution. 

2.4 Optimise parameters of the V-cycle 

The multigrid iteration is improved by introducing a small amount of over 
relaxation. 

First we considered the multigrid method applied to the second-order 
accurate residuals. Numerical optimisation over a range of introduced pa- 
rameter values suggested that the simplest, most robust effective change was 
simply to introduce a parameter p into the Jacobi iterations (^-0) to become 

v tj = 4 (- 2 P h2r i,j + v i-i,j-i + v i+i,j-i + v i+i,j+i + v i-i,j+i) > (11) 
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on a diagonal red grid and similarly for a green grid. An optimal value 
of p was determined to be p = 1.052. The parameter p just increases the 
weight of the residuals at each level by about 5%. This simple change, which 
does not increase the operation count, improves the factor of convergence to 
p ~ 0.052, which decreases the necessary number of iterations to achieve a 
given accuracy. As Table [I] shows, this diagonal multigrid is still far better 
than the usual multigrid even with its optimal choice for over relaxation. 

Then we considered the multigrid method applied to the fourth-order 
accurate residuals. Numerical optimisation of the parameter p in (|10|-|TT|) 
suggests that significantly more relaxation is preferable, namely p ~ 1.20. 
With this one V-cycle of the multigrid method generally reduces the resid- 
uals by a factor p pa 0.200. This simple refinement reduces the number 
of iterations required by about one-third in converging to the fourth-order 
accurate solution. 

3 A diagonal multigrid for the 3D Poisson 
equation 

The hierarchy of grids we propose for solving Poisson's equation (0) in three- 
dimensions is significantly more complicated than that in two-dimensions. 
Figure f| shows the three steps between levels that will be taken to go from 
a fine standard grid (green) of spacing h, via two intermediate grids (red 
and magenta), to a coarser regular grid (blue) of spacing 2h. As we shall 
discuss below, there is some unevenness in the hierarchy that needs special 
treatment. 

3.1 The smoothing restriction steps 

The restriction operation in averaging the residuals from one grid to the next 
coarser grid is reasonably straightforward. 

• The nodes of the red grid are at the corners of the cube and the centre 
of each of the faces as seen in Figure |^. They each have six neighbours 
on the green grid so the natural restriction averaging of the residuals 
onto the red grid is 




(12) 
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Figure 4: one cell of an amalgam of four levels of the hierarchy of grids used 
to form the multigrid V-cycle in 3D: green is the finest grid shown; red is 
the next level coarser grid; magenta shows the next coarser grid; and the 
blue cube is the coarsest to be shown. This stereoscopic view is to be viewed 
cross-eyed as this seems to be more robust to changes of viewing scale. 

for (i,j,k) corresponding to the (red) corners and faces of the coarse 
(blue) grid. When the fine green grid is n x n x n so that there are 
N = n 3 unknowns on the fine green grid, this average takes 8 flops for 
each of the approximately N/2 red nodes. This operation count totals 
AN flops. 

Note that throughout this discussion of restriction from the green to 
blue grids via the red and magenta, we index variables using subscripts 
appropriate to the fine green grid. This also holds for the subsequent 
discussion of the prolongation from blue to green grids. 

• The nodes of the next coarser grid, magenta, are at the corners and 
centres of the cube as seen in Figure || Observe that the centre nodes 
of the magenta grid are not also nodes of the finer red grid; this causes 
some complications in the treatment of the two types of nodes. The 
magenta nodes at the corners are connected to twelve neighbours on 
the red grid so the natural average of the residuals is 

r i,j,k = 7^ {y^ r i,j,k + r i+l,j+l,k + r i+l,j-l,k + r i-l,j-l,k + r i-l,j+l,fc + 

+ r i+l,j,k+l + r i+l,j,k-l + r i-l,j,k-l + r i-l,j,fc+l + 

+ r i,j+l,k+l + r i,j+l,k-l + r i,j-l,k-l + r i,j-l,k+l) ) (13) 

for (i,j,k) corresponding to the magenta corner nodes. This average 
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Figure 5: the green and red grids superimposed showing the nodes of the red 
grid at the corners and faces of the cube, and their relationship to their six 
neighbouring nodes on the finer green grid. 
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takes 14 flops for each of N/8 nodes. The magenta nodes at the centre 
of the coarse (blue) cube is not connected to red nodes by the red grid, 
see Figure |6|. However, it has six red nodes in close proximity, those at 
the centre of the faces, so the natural average is 



(\ ( r *+l,J,fc + T i-l,j,k + r i,j+l,k + r i,j-l,k + r i,j,k+l + r i,j,k-l) ) 

(14) 

for (i, j, k) corresponding to the magenta centre nodes. This averaging 
takes 6 flops for each of N/8 nodes. The operation count for all of this 
restriction step from red to magenta is flops. 

• The nodes of the coarse blue grid are at the corners of the shown 
cube, see Figure ^. On the magenta grid they are connected to eight 
neighbours, one for each octant, so the natural average of residuals 
from the magenta to the blue grid is 

r i,j,k = Yg (^ r *>j> fc r i+l,i+l,fc+l + r i+l,j+l,k-l + r i+l,j-l,fe+l + 
+ r i+l,i-l,fe-l + r i-l,j'+l,fc+l + r i-l,j+l,fc-l + 
+ r i-lJ-l,k+X + r i-lj'-l,fe-l) ' (15) 

for (i,j,k) corresponding to the blue corner nodes. This averaging 
takes 10 flops for each of -/V/8 blue nodes which thus totals l|7V flops. 

These three restriction steps, to go up three levels of grids, thus total ap- 
proximately 7^N flops. Hence, the entire restriction process, averaging the 
residuals, from a finest grid of spacing h up 3L levels to the coarsest grid of 
spacing H = 2 L h takes 

K r w —N (l - —) flops w 8fiV flops. (16) 



7 



The simplest standard one-step restriction direct from the fine green grid to 
the blue grid takes approximately 3|iV flops. Over the whole hierarchy this 
totals 4|iV flops which is roughly half that of the proposed method. We an- 
ticipate that rapid convergence of the V-cycle makes the increase worthwhile. 



3.2 The Jacobi prolongation steps 

As in 2D, with this rich structure of grids we have no need to interpolate when 
prolongating from a coarse grid onto a finer grid; an appropriate "red-black" 
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Figure 7: the magenta and blue grids superimposed showing the common 
nodes at the corners of the blue grid and the connections to the magenta 
centre node. 
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Jacobi iteration of the residual equation (0) avoids interpolation. Given an 
estimate of corrections v\ ■ k at some blue level grid we proceed to the finer 
green grid via the following three prolongation steps. 

• Perform a magenta-blue Jacobi iteration on the nodes of the magenta 
grid shown in Figure [7]. See that each node on the magenta grid is 
connected to eight neighbours distributed symmetrically about it, each 
contributes to an estimate of the Laplacian at the node. Thus, given 
initial approximations on the blue nodes from the coarser blue grid, 



- (-4p m h< 



v i,j,k ~ o ( 4p m /i r ij)fc + v i+1 j +l k+1 + v i+1 j +1 ^ k _ 1 + v i+l -_ lk+l - 



u i+l,j-l,k-l ' u i-l,j+l,k+l ~r u i-l,j+l,k-l 



+ vt' k+1 + vil' k _ 1 ) , (17) 



for (i,j,k) on the centre magenta nodes. The following blue- Jacobi 
iteration uses these updated values in the similar formula 



+ + J , (18) 

for (z, j, fc) on the corner blue nodes. In these formulae the over relax- 
ation parameter p m has been introduced for later fine tuning; initially 
take p m = 1. The operation count for this magenta-blue Jacobi itera- 
tion is 10 flops on each of N/4 nodes giving a total of 2|iV flops. 

Perform a red-magenta Jacobi iteration on the nodes of the red grid 
shown in Figure ^. However, because the centre node (magenta) is 
not on the red grid, two features follow: it is not updated in this 
prolongation step; and it introduces a little asymmetry into the weights 
used for values at the nodes. The red nodes in the middle of each face 
are surrounded by four magenta nodes at the corners and two magenta 
nodes at the centres of the cube. However, the nodes at the centres are 
closer and so have twice the weight in the estimate of the Laplacian. 
Hence, given initial approximations on the magenta nodes from the 
coarser grid, 



v 



I {-2p rl h 2 r[ hk + 2 vl 3 / k+1 + v\^_ x 



.£-1 i ..t-1 



+ 



+ v i+l,j+l,k + v i+l,j-l,k + v i-l,j-l,k + v i-l,j+l,k) ' (^) 
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for k) corresponding to the red nodes on the centre of faces nor- 
mal to the ^-direction. Similar formulae apply for red nodes on other 
faces, cyclically permute the role of the indices. The over relaxation 
parameters p r \ and p r 2 are introduced for later fine tuning; initially 
take p r i = p r 2 = 1. The following magenta- Jacobi iteration uses these 
updated values. Each magenta corner node in Figure || is connected to 
twelve red nodes and so is updated according to 

v Lk = ^(-^Pr2h 2 rl jtk + 

+ V i+l,j+l,k + V i+l,j-l,k + V i-l,j-l,k + u i-l,j+l,fc+ 
+ V i+l,j,k+l + V i+l,j,k-l + V i-l,j,k-l + V i-l,j,k+l^~ 
+ V i,j+l,k+l + V i,j+l,k-l + V i,j-l,k-l + V i,j-l,k+l) > (20) 

for all (i,j,k) corresponding to corner magenta nodes. The operation 
count for this red- magenta Jacobi iteration is 9 flops on each of 3N/ 8 
nodes and 14 flops on each of iV/8 nodes. These total 5|iV flops. 

• Perform a green-red Jacobi iteration on the nodes of the fine green grid 
shown in Figure The green grid is a standard rectangular grid so the 
Jacobi iteration is also standard. Given initial approximations on the 
red nodes from the coarser red grid, 

v i,j,k = q (~Pgh 2r i,j,k + v i+l,j,k + v i-lj,k + v i,j+l,k + v i,j-l,k+ 

+ + , (21) 

for (i,j,k) corresponding to the green nodes (edges and centre of the 
cube). The over relaxation parameter p g , initially p g = 1, is introduced 
for later fine tuning. The red- Jacobi iteration uses these updated values 
in the similar formula 

V i,j,k = q (y~Pgh 2r i.j,k + v i+l,j,k + v i-lj,k + v i,j+l,k + v i,j-l,k Jr 

+ 4i,k+i + v Lk-i) , (22) 

for the red nodes in Figure g. This prolongation step is a standard 
Jacobi iteration and takes 8 flops on each of N nodes for a total of 
8N flops. 

These three prolongation steps together thus total 15|iV flops. To prolongate 
over £ = 3L levels from the coarsest grid of spacing H = 2 L h to the finest 
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Table 2: comparison of cost, in flops, and performance for unoptimised al- 
gorithms for solving Poisson's equation in three spatial dimensions on a 17 3 
grid. The column headed "per iter" shows the number of flops per iteration, 
whereas column showing "per dig" is flops/ log 10 po and indicates the number 
of flops needed to compute each decimal digit of accuracy. 



algorithm 


per iter 


po per dig 


diagonal, O (h 2 ) 
usual, O (h 2 ) 


35. 77V 
26.17V 


0.140 42iV 
0.477 817V 


diagonal, O (h 4 ) 
usual, O (/i 4 ) 


48.7iV 
39.17V 


0.659 2697V 
0.651 2107V 



grid thus takes 

K p « i^TV (l - -L) flops « 17f TV flops. (23) 

The simplest trilinear interpolation direct from the blue grid to the green grid 
would take approximately 3|7V flops, to be followed by 87V flops for a Jacobi 
iteration on the fine green grid. Over the whole hierarchy this standard 
prolongation takes approximately 12|7V flops. This total is smaller, but the 
proposed diagonal grid achieves virtually three Jacobi iterations instead of 
one and so is more effective. 



3.3 The V-cycle converges well 

Numerical experiments show that, as in 2D, although the operation count of 
the proposed algorithm is a little higher, the speed of convergence is much 
better. Both algorithms use 97V flops to compute second-order accurate resid- 
uals on the finest grid. Thus the proposed method takes approximately 
35|7V flops for one V-cycle, some 37% more than the 26|7V flops of the sim- 
plest standard algorithm. It achieves a mean factor of convergence p ~ 0.140. 
This rapid rate of convergence easily compensates for the small increase in 
computations taking half the number of flops per decimal digit accuracy 
determined. 

As in 2D, fourth-order accurate solvers may be obtained simply by using 
the above second-order accurate V-cycle on the fourth-order accurate resid- 
uals evaluated on the finest grid. A compact fourth-order accurate scheme 
for the residuals is the 19 point formula 

r hj,k = V*fi,j,k fi+l,j,k + fi,j+l,k + fi-l,j,k + fi,j—l,k + fi,j,k+l J T 
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Table 3: comparison of cost, in flops, and performance for optimised algo- 
rithms for solving Poisson's equation in three spatial dimensions on a 17 3 
grid varying over relaxation parameters to determine the best rate of conver- 
gence. The column headed "per iter" shows the number of flops per iteration, 
whereas column showing "per dig" is flops/ log 10 p and indicates the number 
of flops needed to compute each decimal digit of accuracy. 



algorithm 


per iter 


Pm Prl Pr2 Pg P P^Y dig 


diag, O (h 2 ) 
usual, 0(h 2 ) 


35.7iV 
26.1N 


1.11 1.42 1.08 0.99 0.043 26N 
1.30 0.31 51JV 


diag, O (h 4 ) 
usual, O (h 4 ) 


48. 7N 
39.17V 


0.91 0.80 0.70 1.77 0.39 1197V 
1.70 0.41 1017V 



+ fi,j,k-l) — [~^ U i,j,k + 2 (/Uij-l,fc + Wjj+l,fc + Uj-l J j,fc+ 

+ u i+l,j,k + Ui,j,k+1 + Ui,j,k-l) + U i+ ij + i t k + Ui-ij + i± + 

+ u i-l,j-l,k + u i+l,j-l,k + Ui,j+l,k+l + u i,j+l,k-l + u i,j-l,k-l + 

+ Wjj-i^+l + + Wj-l,j,fc+l + + Mi+l,i,fc-l](24) 

Then using the V-cycle described above to determine corrections v to the 
field u leads to an increase in the operation count of 13iV flops solely from 
the extra computation in finding the finest residuals. Numerical experiments 
show that the multigrid iteration still converge, albeit slower, with p m 0.659. 
Table ^ shows that the rate of convergence on the diagonal hierarchy of grids 
is little different than that for the simplest usual multigrid algorithm. As in 
2D, high accuracy, 4th order solutions to Poisson's equation are best found 
by employing a first stage that finds 2nd order accurate solutions which are 
then refined in a second stage. 



3.4 Optimise parameters of the V-cycle 

As in 2D, the multigrid algorithms are improved by introducing some relax- 
ation in the Jacobi iterations. The four parameters p m , p r i, p r 2 and p g were 



introduced in the Jacobi iterations (|17]-|22|) to do this, values bigger than 1 
correspond to some over relaxation. 

The search for the optimum parameter set used the Nelder-Mead simplex 
method encoded in the procedure fmins in matlab. Searches were started 
from optimum parameters found for coarser grids. As tabulated in Table § 
the optimum parameters on a 17 3 gridQ were p m = 1.11, p rl = 1.42, p r2 = 

Systematic searches on a finer grid were infeasible within one days computer time 
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1.08 and p g = 0.99 and achieve an astonishingly fast rate of convergence of 
p ~ 0.043. This ensures convergence to a specified precision at half the cost 
of the similarly optimised, simple conventional multigrid algorithm. 

For the fourth-order accurate residuals an optimised diagonal multigrid 
performs similarly to the optimised conventional multigrid with a rate of 
convergence of p ~ 0.39. Again fourth order accuracy is best obtained after 
an initial stage in which second order accuracy is used. 

4 Conclusion 

The use of a hierarchy of grids at angles to each other can halve the cost of 
solving Poisson's equation to second order accuracy in grid spacing. Each 
iteration of the optimised simplest multigrid algorithm decreases errors by 
a factor of at least 20. This is true in both two and three dimensional 
problems. Further research is needed to investigate the effective of extra 
Jacobi iterations at each level of the diagonal grid. 

When compared with the amazingly rapid convergence obtained for the 
second order scheme, the rate of convergence when using the fourth order 
residuals is relatively pedestrian. This suggests that a multigrid V-cycle 
specifically tailored on these diagonal grids for the fourth order accurate 
problem may improve convergence markedly. 

There is more scope for W-cycles to be effective using these diagonal 
grids because there are many more levels in the multigrid hierarchy. An 
exploration of this aspect of the algorithm is also left for further research. 
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